在拒绝的环境中进行搜索对于群体机器人来说是具有挑战性的,因为不允许GNSS,映射,数据共享和中央处理的帮助。但是,使用嗅觉和听觉像动物一样合作可能是改善群体合作的重要方法。在本文中,提出了一群自主机器人来探索拒绝环境的嗅觉审计算法算法(OA-BUG)。构建了一个模拟环境,以衡量OA-BUG的性能。使用OA-BUG的搜索任务覆盖范围可以达到96.93%,与类似的算法SGBA相比,最大的40.55%提高了40.55%。此外,在实际的群机器人上进行了实验,以证明OA-BUG的有效性。结果表明,OA-BUG可以在被拒绝的环境中改善群体机器人的性能。
translated by 谷歌翻译
最近,注意机制已成功应用于基于神经网络的说话者验证系统。将挤压和兴奋的块纳入卷积神经网络中的表现出色。但是,它使用全球平均池(GAP)简单地沿时间和频率维度平均功能,这无法在功能地图中保留足够的扬声器信息。在这项研究中,我们表明GAP是时间频域在数学上仅使用频率分解中最低频率分量的特殊情况。为了增强扬声器信息提取能力,我们建议利用多频信息,并设计两个新颖的有效注意模块,称为单频率单通道(SFSC)注意模块和多频单通道(MFSC)注意模块。提出的注意模块可以根据DCT有效地从多个频率组件中捕获更多扬声器信息。我们在Voxceleb数据集上进行了全面的实验,并对第148个UTD法医语料库进行了探测评估。实验结果表明,我们提出的SFSC和MFSC注意模块可以有效地产生更具歧视性的扬声器表示,并且优于RESNET34-SE和ECAPA-TDNN系统,而EER降低了20.9%和20.2%,而无需添加额外的网络参数。
translated by 谷歌翻译
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
translated by 谷歌翻译
经典地,连续时间兰富文队扩散在唯一的假设下迅速迅速迅速迅速迅速,以至于$ \ PI $满足POINCAR的不平等。使用这一事实来为离散时间Langevin Monte Carlo(LMC)算法提供保证,因此由于需要与Chi Squared或R \'enyi分歧的需要,并且在很大程度上主要重点关注日志凹形目标。在这项工作中,我们为LMC提供了第一个收敛保证,假设$ \ PI $满足Lata {\ l} a - oleszkiewicz或修改的log-sobolev不等式,它在Poincar \ e和log-sobolev设置之间插值。与现有作品不同,我们的结果允许弱滑性,并且不需要凸起或耗散条件。
translated by 谷歌翻译
没有发言者标签的培训扬声器 - 识别和强大的发言者验证系统仍然挑战和值得探索。在这项研究中,我们提出了一种有效的自我监督的学习框架和一种新的正规化策略,以促进自我监督的发言者代表学习。不同于基于对比的自我监督的学习方法,所提出的自我监督正则化(SSREG)专注于正数据对潜在的潜在表示之间的相似性。我们还探讨了替代在线数据增强策略对时域和频域的有效性。凭借我们强大的在线数据增强策略,所提出的SSREG显示了自我监督学习的潜力,而不使用负对对,它可以显着提高自我监督扬声器表示学习与简单的暹罗网络架构的表现。 VOXECEB数据集的综合实验表明,我们提出的自我监督方法通过增加有效的自我监督正则化和胜过其他以前的作品来获得23.4%的相对改善。
translated by 谷歌翻译
我们提出了一种基于langevin扩散的算法,以在球体的产物歧管上进行非凸优化和采样。在对数Sobolev不平等的情况下,我们根据Kullback-Leibler Divergence建立了有限的迭代迭代收敛到Gibbs分布的保证。我们表明,有了适当的温度选择,可以保证,次级最小值的次数差距很小,概率很高。作为一种应用,我们考虑了使用对角线约束解决半决赛程序(SDP)的burer- monteiro方法,并分析提出的langevin算法以优化非凸目标。特别是,我们为Burer建立了对数Sobolev的不平等现象 - 当没有虚假的局部最小值时,但在鞍点下,蒙蒂罗问题。结合结果,我们为SDP和最大切割问题提供了全局最佳保证。更确切地说,我们证明了Langevin算法在$ \ widetilde {\ omega}(\ epsilon^{ - 5})$ tererations $ tererations $ \ widetilde {\ omega}(\ omega}中,具有很高的概率。
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
Interview has been regarded as one of the most crucial step for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far away from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provides mock interview services to the job seekers. The task is challenging in two ways: (1) the interview data are now available but still of low-resource; (2) to generate meaningful and relevant interview dialogs requires thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and dialog generator so that most parameters can be trained with ungrounded dialogs as well as the resume data that are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results to generate mock interviews. With the help of EZInterviewer, we hope to make mock interview practice become easier for job seekers.
translated by 谷歌翻译